Abstract. Social communities have existed in bibliographic databases for many
years: researchers share common research interests, and work and publish
together. A social community may vary in type and size; it may be fully
connected among its participating members, or it may be a consortium of small,
individual members who each play an individual role in it. In this work, we
focus on social communities inside the bibliographic database DBLP and
characterize communities through a simple typifying description model.
Generally, we understand a publication as a transaction between the associated
authors. The idea is therefore to consider directed associative relationships
among them, to decompose each pattern into its fundamental structure, and to
describe the communities by expressive attributes. Finally, we argue that the
decomposition supports the management of discovered structures towards the use
of adaptive-incremental mind-maps.



1 Bibliographic Libraries 

DBLP (jH], in]) is an online database of bibliographic entries for scientific
publications in Computer Science. It maintains publications by single authors
or sets of authors, is public, and offers a retrieval interface for querying
publication entries. As of March 2008, more than one million entries had been
added to the database. The earliest publications date from 1936, the latest
from 2008. An intelligent DBLP core gathers data from known conferences,
collecting entries from electronic publications. If we interpret each
publication as a data stream element (the authors spend some time together,
converse, and produce a result), then this leads to problems and challenges
similar to those given above.

We have performed association discovery on DBLP entries on a yearly basis for
the last 72 years. We chose a yearly basis because a data set of that size is
more expressive than a monthly data set, yet more precise than a coarser
discretisation. The association rules were calculated on a sample as follows:
first, we set the item set frequency threshold to 0.1%; second, we set the
Bayesian probability threshold to 5%. Third, since the number of resulting
rules was still tremendously high, we filtered them by the lift, i.e., the
ratio between the Bayesian probability of an association rule A ⇒ B and the
value expected if A and B were statistically independent. In this way, the
number of scientific contributions was reduced; we obtained a distribution
curve with its peak in 1972. Although the publication of a scientific paper is
delayed by several months (submission date, evaluation period, publication),
we counted each scientific paper towards the year in which it appeared. The
yearly calculation of association rules, beginning in 1936 and ending in 2007,
produced a varying number of patterns.
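
The three filtering steps can be illustrated with a minimal Python sketch. It
treats each publication as a transaction (a set of author names) and computes,
for every ordered author pair, the item set frequency (support), the Bayesian
probability P(B | A), and the lift P(B | A) / P(B). The function name
`directed_rules`, the input layout, and the lift cut-off of 1.0 are
assumptions made for illustration; the text does not state which lift
threshold was applied.

```python
from collections import Counter
from itertools import combinations


def directed_rules(transactions, min_support=0.001, min_confidence=0.05,
                   min_lift=1.0):
    """Return {(a, b): (support, confidence, lift)} for rules a => b.

    transactions: list of sets of author names, one set per publication.
    min_support (0.1%) and min_confidence (5%) follow the text;
    min_lift is a hypothetical cut-off chosen for this sketch.
    """
    n = len(transactions)
    single = Counter()  # publications per author
    pair = Counter()    # joint publications per unordered author pair
    for authors in transactions:
        single.update(authors)
        pair.update(combinations(sorted(authors), 2))

    rules = {}
    for (a, b), joint in pair.items():
        support = joint / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = joint / single[x]       # P(y | x)
            lift = confidence / (single[y] / n)  # P(y | x) / P(y)
            if confidence >= min_confidence and lift >= min_lift:
                rules[(x, y)] = (support, confidence, lift)
    return rules
```

Because the confidence is computed per direction, a pair of authors may yield
a rule in one direction only, which is what later distinguishes single from
double atomic bonds.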




Fig. 1. DBLP: Snapshot visualization of publishing communities (1994). 



Some of them are presented in Figure 1, which shows the distribution of the
publishing authors in 1994. The publishing communities in 1994 are mainly
characterized by a central star consisting of many author nodes. In general,
our observations produced a diverse set of patterns that are often quite
similar and that are based on very simple geometric structures. Most
interestingly, some patterns went away but appeared again, while others stayed
stable or disappeared forever. For example, the big star (Figure 1) did not
exist before 1955 but appeared several times afterwards, for example in 1994,
disappeared temporarily, and appeared again in 2006 (visiting pattern). Simple
structures like those in Figure 2 are present continuously (constant pattern),
without any temporal break: this sounds interesting at first, but in fact it
simply reflects a certain kind of noise within the set of results.
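
The distinction between visiting and constant patterns can be sketched in a
few lines of Python, assuming that for each pattern we know the years in which
it was observed. The function name `classify_pattern` and the labels
"vanished", "emerging", and "absent" are hypothetical; only "visiting" and
"constant" are taken from the text.

```python
def classify_pattern(years_present, first_year, last_year):
    """Label a pattern by its presence within [first_year, last_year]."""
    present = sorted(y for y in set(years_present)
                     if first_year <= y <= last_year)
    if not present:
        return "absent"
    # A gap means the pattern disappeared and came back later.
    gaps = sum(1 for a, b in zip(present, present[1:]) if b - a > 1)
    if gaps > 0:
        return "visiting"
    if present[0] == first_year and present[-1] == last_year:
        return "constant"
    if present[-1] < last_year:
        return "vanished"   # disappeared and did not return
    return "emerging"       # appeared late and stayed
```

A pattern observed in every year of the window is labelled constant, whereas
a pattern with at least one gap between observations is labelled visiting.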

2 A Simple Typifying Model 

Following our observations, we typify each associative pattern by its
fundamental structure and, since these structures are evocative of basic
chemical building blocks, we label them in almost the same manner.

Each author node i corresponds to an atomic author nucleus, which has a certain
activation act_i and a number of atomic bonds with other nuclei. In the
following model description, we keep these bonds unvalued, although a strength
between adjacent atomic author nuclei exists per se.



Fig. 2. Single atomic bonds (left) and double atomic bonds (right) between two
atomic author nuclei.

As presented in Figure 2 (left), we call such a connection a single atomic
bond: it describes the relationship between two author nuclei on the lowest
level and points in one direction only. Furthermore, we call a relationship a
double atomic bond if the relationship between two authors is undirected (see
Figure 2, right), in the sense that both the association from author nucleus A
to B and the association from B to A are frequent. Both structures define the
fundamental molecular structure between two adjacent nuclei. If two nuclei
share two single atomic bonds of opposite direction, we call this undirected
relationship a bridge. All these molecules are called 2-ary, since only two
atoms are involved.

Accordingly, any combination of atomic structures results in further molecules
of different granularity and size. In general, we characterize molecules by
their arity. Some examples of 2-ary fundamental molecules are presented in
Figure 3. On the left, an atomic nucleus A is arranged as the centre of k
adjacent nuclei. We call this molecule structure a molecule star, meaning that
the inner nucleus depends on each adjacent nucleus: following the meaning of
the bond, a publication of A has always occurred under the condition that
someone else has published. If an author nucleus is connected only to disjoint
nuclei that share no other bonds, then the author nucleus is still 2-ary,
otherwise n-ary.

On the right side of Figure 3, we observe a molecule star that is the opposite
of the previous one: A is the originator of every publication, i.e., A is the
condition under which the others publish. With this, we differentiate between
different roles of a nucleus:

- An author nucleus A is a Trigger if A influences another author B: A ⇒ B
- An author nucleus A is a Reactor if A is influenced by another author B:
  B ⇒ A
- Two author nuclei A and B define a bridge if they share a double atomic
  bond.

Fig. 3. 2-ary types of molecule stars of different roles; the last one is a
combination of both.


Some molecule structures that mix single and double atomic bonds are shown in
Figure 4. The structures a) and d) are fully composed of double atomic bonds,
whereas b) and c) contain single atomic bonds as well. We call the molecular
structure in a) a molecular diamond and the central sub-structure in c) a
molecular bridge. The structure in b) is a mixture of a molecular star and a
molecular diamond; the structure in d) consists of four overlapping molecular
diamonds. With this, molecular stars can be seen as communities that consist of
an arbitrary number of triggers and reactors, and a molecular diamond is
nothing other than a composition of bridges. Furthermore, all molecules are
still 2-ary.

In this respect, a community is a set of author nuclei that are connected by
molecular structures. For example, if A ⇒ B, then the nuclei A and B are
connected, where B is reachable from A. A collection of associations like
A ⇒ B, B ⇒ C, and C ⇒ A therefore yields a structure in which each author
nucleus can be reached from every other one. Two disjoint molecules define
disjoint social communities.

In contrast to this, a collection of associations like A ⇒ B and B ⇒ C yields
a strict sequence from nucleus A to nucleus C in which A cannot be reached from
B or C (as an example, see Figure 7 with a molecular arrow and a molecular
triangle). In the following, we will consider only 2-ary molecular
structures.
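
The notions of reachability and community used above can be sketched as
follows. The sketch assumes that the discovered rules are given as ordered
pairs (a, b) for a ⇒ b, and that "connected" ignores the direction of a bond
when communities are formed; both the function names and that reading are
assumptions rather than part of the model description.

```python
from collections import defaultdict


def reachable(rules, start):
    """Nuclei reachable from `start` by following rules a => b forwards."""
    succ = defaultdict(set)
    for a, b in rules:
        succ[a].add(b)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in succ[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def communities(rules):
    """Group nuclei into communities, ignoring the direction of each bond."""
    neigh = defaultdict(set)
    for a, b in rules:
        neigh[a].add(b)
        neigh[b].add(a)
    seen, result = set(), []
    for node in sorted(neigh):
        if node in seen:
            continue
        component, stack = set(), [node]
        while stack:
            cur = stack.pop()
            if cur in component:
                continue
            component.add(cur)
            stack.extend(neigh[cur] - component)
        seen |= component
        result.append(component)
    return result
```

For the cyclic collection A ⇒ B, B ⇒ C, C ⇒ A, every nucleus appears in the
reachable set of every other one; for the sequence A ⇒ B, B ⇒ C, the set
returned for C is empty and A is not reachable from B.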

3 Algorithmic Decomposition 

A decomposition then proceeds as follows: let a_i, a_j, and a_k be disjoint
author nuclei with natural numbers 1 ≤ i ≠ j ≠ k ≤ s, and let STAR, BRIDGE,
and DIAMOND be data structures that manage the identified molecule structures.
A reactor is defined to be true (= 1) for a nucleus when an association from
another nucleus to it exists, otherwise false (= 0):

$$
\mathrm{reactor}(a_i) =
\begin{cases}
1 & : \exists a_j \text{ with } a_j \rightarrow a_i \\
0 & : \text{else}
\end{cases}
$$

A trigger is defined to be true for a nucleus when an association from it to
another nucleus exists, otherwise false:

$$
\mathrm{trigger}(a_i) =
\begin{cases}
1 & : \exists a_j \text{ with } a_i \rightarrow a_j \\
0 & : \text{else}
\end{cases}
$$

Furthermore, the single atomic bond is a predicate that evaluates to true
(= 1) when exactly one directed connection exists between two author nuclei
(i.e., the exclusive OR):

$$
\mathrm{sbond}(a_i, a_j) =
\begin{cases}
1 & : (a_i \rightarrow a_j) \oplus (a_j \rightarrow a_i) \\
0 & : \text{else}
\end{cases}
$$

Fig. 4. Selected molecular forms with molecular diamond (above, left) and
molecular mixture of star and diamond (above, right), mixture with single and
double atomic bonds and molecular bridge (below, left), double diamond (below,
right).

Fig. 6. Decomposition to eight molecular stars with triggers and reactors and k
molecular diamonds.

Fig. 7. Examples for 3-ary molecules.

A double atomic bond is a predicate that evaluates to true (= 1) when
associations exist in both directions, from a_i to a_j and from a_j to a_i:

$$
\mathrm{dbond}(a_i, a_j) =
\begin{cases}
1 & : (a_i \rightarrow a_j) \wedge (a_j \rightarrow a_i) \\
0 & : \text{else}
\end{cases}
$$

The predicate emptyp(a_i) evaluates to true when a_i shares no bonds (neither
single nor double):

$$
\mathrm{emptyp}(a_i) =
\begin{cases}
1 & : \nexists a_j : (a_i \rightarrow a_j) \vee (a_j \rightarrow a_i) \\
0 & : \text{else}
\end{cases}
$$
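
As a minimal sketch, the predicates can be written in Python over a set of
directed rules. The representation of the rule set as a set of ordered pairs
(a, b) for a → b and the function names `reactor` and `trigger` are
assumptions for illustration; `sbond`, `dbond`, and `emptyp` follow the names
used in the definitions.

```python
def reactor(rules, a):
    """1 if some other nucleus points at a, else 0."""
    return int(any(dst == a for _, dst in rules))


def trigger(rules, a):
    """1 if a points at some other nucleus, else 0."""
    return int(any(src == a for src, _ in rules))


def sbond(rules, a, b):
    """1 if exactly one of a -> b and b -> a exists (exclusive OR)."""
    return int(((a, b) in rules) != ((b, a) in rules))


def dbond(rules, a, b):
    """1 if both a -> b and b -> a exist."""
    return int((a, b) in rules and (b, a) in rules)


def emptyp(rules, a):
    """1 if a takes part in no rule at all, neither as source nor as target."""
    return int(not any(a in rule for rule in rules))
```

With rules = {("A", "B"), ("B", "C"), ("C", "B")}, nucleus A is a trigger but
not a reactor, A and B share a single atomic bond, and B and C share a double
atomic bond.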



Moreover, the decomposition into a molecular star, a molecular bridge, or a
molecular diamond can be computed by simple functions that are based on the
previously defined predicates. In this respect, a molecular star exists around
an author nucleus a_i if the function computeSTAR(a_i) returns a non-empty
STAR(a_i).

computeSTAR (a_i)
   for all 1 <= j <= s, j != i do:

      if ( sbond(a_i, a_j) )
         then add a_j to STAR (a_i)

   od;

   exit with STAR (a_i)

computeSTAR(a_i) reads in an arbitrary author nucleus and exits with the
complete list of single bonds associated with a_i. The average computation
time is polynomial. A molecular bridge exists when the result BRIDGE(a_i) of
the function computeBRIDGE(a_i) is non-empty.

computeBRIDGE (a_i)
   for all 1 <= j <= s, j != i do:

      if ( dbond(a_j, a_i) )
         then add a_j to BRIDGE (a_i)

   od;

   exit with BRIDGE (a_i)

computeBRIDGE(a_i) reads in an arbitrary author nucleus and exits with the list
of bridges in which it takes part. As for STAR, the average computation time is
polynomial. Finally, a molecular diamond exists when the result DIAMOND(a_i) of
the function computeDIAMOND(a_i) is non-empty; it is a list of author nuclei
associated with a_i.

computeDIAMOND (a_i)

   for all 1 <= j, k <= s with i, j, k pairwise different do:

      if ( dbond(a_i, a_j) and dbond(a_j, a_k) and dbond(a_k, a_i) )
         then add (a_j, a_k) to DIAMOND (a_i)

   od;

   exit with DIAMOND (a_i)
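
The three functions can be sketched in runnable form as follows. This is an
illustration under stated assumptions, not the original implementation: the
rule set is a set of ordered pairs (a, b) for a → b, the list of nuclei is
passed explicitly, and the Python names `compute_star`, `compute_bridge`, and
`compute_diamond` are hypothetical counterparts of the pseudocode. Unlike the
loop over all ordered pairs (j, k), the sketch reports each diamond partner
pair only once.

```python
from itertools import combinations


def dbond(rules, a, b):
    """True if both a -> b and b -> a exist."""
    return (a, b) in rules and (b, a) in rules


def compute_star(rules, nuclei, a_i):
    """Nuclei that share a single (one-directional) bond with a_i."""
    return [a_j for a_j in nuclei
            if a_j != a_i and ((a_i, a_j) in rules) != ((a_j, a_i) in rules)]


def compute_bridge(rules, nuclei, a_i):
    """Nuclei that share a double bond (bridge) with a_i."""
    return [a_j for a_j in nuclei if a_j != a_i and dbond(rules, a_i, a_j)]


def compute_diamond(rules, nuclei, a_i):
    """Pairs (a_j, a_k) that close a triangle of bridges with a_i."""
    partners = compute_bridge(rules, nuclei, a_i)
    return [(a_j, a_k) for a_j, a_k in combinations(partners, 2)
            if dbond(rules, a_j, a_k)]
```

Each of the first two functions makes a single pass over the s nuclei;
`compute_diamond` inspects pairs of bridge partners and is therefore at most
quadratic in s.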

4 Social Networking

With the defined predicates and functions, we are able to decompose molecular
structures. In this sense, molecular stars can be seen as communities that
consist of an arbitrary number of triggers and reactors, and a molecular
diamond is nothing other than a composition of bridges. Furthermore, a
decomposition of molecular structures can be performed quite easily, leading to
a number of descriptive attributes as shown in Table 1.

4.1 Social Network Clustering with/without envelopes 

Figure 8 briefly displays a selection of molecular structures that have
appeared over the years. In one structure, we notice a mixture of molecular
diamonds and molecular stars, in another a mixture of molecular bridges and
atomic bonds, and in a third three bridges only, concentrated on one node. All
of the other structures are similar and decomposable. The different roles to be
observed fully correspond to the ones given above: triggers (outgoing link) and
reactors (incoming link).




Fig. 8. Selected molecular structures that appear when visualizing associative
communities.

Using such a data table for clustering, we may then obtain groups of similar
social sub-networks. This is a simplification of the existing molecular
communities. For example, if we take the raw attribute data SB (number of
single bonds), BR (number of bridges), DI (number of diamonds), NU (number of
nuclei), RE (number of reactor nodes), and TR (number of trigger nodes) as
input for a hierarchical clustering and compute the distances between these
attributes individually, we may group the examples according to their
similarities and thereby perform a clustering.
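
A row of such a data table could be derived from the rules of one community
roughly as follows. The sketch assumes that the rules are given as ordered
pairs (a, b), that a diamond is counted as a triangle of bridges, and that a
nucleus taking part in a bridge counts both as a trigger and as a reactor;
the function name `describe` is hypothetical.

```python
from itertools import combinations


def describe(rules):
    """Return the attributes SB, BR, DI, NU, RE, TR for one community."""
    rules = set(rules)
    nuclei = sorted({n for rule in rules for n in rule})

    def both(a, b):
        # Double atomic bond: rules exist in both directions.
        return (a, b) in rules and (b, a) in rules

    pairs = list(combinations(nuclei, 2))
    return {
        # Pairs connected in exactly one direction.
        "SB": sum(1 for a, b in pairs
                  if ((a, b) in rules) != ((b, a) in rules)),
        "BR": sum(1 for a, b in pairs if both(a, b)),
        # Assumption: one diamond per triangle of bridges.
        "DI": sum(1 for a, b, c in combinations(nuclei, 3)
                  if both(a, b) and both(b, c) and both(a, c)),
        "NU": len(nuclei),
        "RE": len({dst for _, dst in rules}),
        "TR": len({src for src, _ in rules}),
    }
```

For a triangle of bridges among A, B, and C with one additional rule from A to
D, the sketch reports one single bond, three bridges, one diamond, and four
nuclei.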

Table 1. Description of the molecular structures from Figures 3 and 4.



With this decomposition into n-ary molecules, we aim to decompose each
publishing community and to describe it by its molecular attributes. Applying
clustering to such a data table, which contains a description of the molecular
structures, we may then obtain groups of similar molecular structures. The
advantage of such an analysis is a simplification of existing molecular
communities with respect to their structure.
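
A hierarchical clustering of such attribute rows can be sketched without any
external library. The choice of Euclidean distance and single linkage is an
assumption made here for brevity, since the text does not name the distance
or linkage that was used; the function names and the row layout
(SB, BR, DI, NU, RE, TR) are likewise illustrative.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two attribute rows of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def agglomerate(rows, k):
    """Single-linkage agglomerative clustering into k groups.

    rows: {name: (SB, BR, DI, NU, RE, TR)}
    """
    clusters = [[name] for name in sorted(rows)]
    while len(clusters) > max(k, 1):
        best = None
        # Find the two clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(rows[a], rows[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Given hypothetical rows for two star-like and two diamond-like structures,
`agglomerate(rows, 2)` groups the stars and the diamonds separately, because
their attribute rows differ mainly in the SB, BR, and DI columns.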

4.2 Social Role Discovery 

The immediate identification of roles in social communities is shown in Figure
9: here, we may observe molecular diamonds and molecular stars, with Micha
Sharir as molecular trigger for seven other authors. Furthermore, Carlos Sanchez
is both a molecular trigger and a molecular reactor, whereas Eric Dubois, Phillipe
Dubois, and Michael Petit form a molecular diamond.







Fig. 9. Selected Social Communities in 1994. 

A more complex scenario can be observed in Figure 10, which shows the big
star. This big star is present in many years and represents a huge community
that shares publications. This structure therefore symbolizes fruitful years
with strong interrelations and cooperation.




Fig. 10. Social community exemplifying the big star.



We may observe that the big star is nothing other than a composition of many
author nuclei in its inner centre that are connected to external nuclei.
Moreover, we may observe that each of these nuclei is either a reactor, a
trigger, or both. Overall, the molecule is composed of different levels: the
closer to the centre a nucleus is positioned, the greater its importance. A
nucleus in the core acts much more as a trigger than a node on the outside, and
its influence on the direction of research might be stronger, if we accept that
this can be influenced by publications.

4.3 Temporal Comparison of Social Networks 

Initially, we observe very simple molecules in the years before 1950, because
fewer publications were made. The first molecular bridge can be observed in
1953, the first more complex structure in 1954. The evolution is reminiscent of
cell division in natural processes, leading to a first big star in 1960.
Interestingly, the molecular noise (pairwise but disjoint publications that
share no publications with others) is present the whole time and continuously
stays at a similar percentage. Furthermore, years such as 1961, 1975, 1978,
1991, and 1993 are particularly notable: they do not contain a big star, but
rather signify an evolutionary step from one research topic to another.
Moreover, the more recent years from 1995 to 2005 consistently appear without a
dominating molecule, which might be the result of an evolving internet with
lots of potential in many areas. This vivid and colorful landscape in
particular drives the scientific community towards a multitude of research
activities, which finally show up as publications in different areas. On the
other hand, the number of researchers may have grown, and the possibilities for
publishing work electronically have increased tremendously.




Fig. 11. Selected molecular communities, showing the years from 1970 to 1999. 

A yearly flight over the association landscape between 1970 and 1999 yields the
results presented in Figure 11. The first years are characterized by an
alternation between the big star (two consecutive years) and one year of
restructuring. The latter is the case, for example, in 1975, 1978, and 1981.
Interestingly, the years in which Artificial Intelligence became significant
could be characterized by the social communities between 1982 and 1990, which
dominate the publication landscape and leave less space for other social
communities. In contrast to this, the social communities in the 1990s are
generally not concerned with one social domain but stay manifold and
distributed, sharing simpler molecular structures than in the years before.



5 Conclusions 

We have focused on entries of the bibliographic database DBLP and
characterized communities through a simple typifying description model. We
treat a publication as a transaction between its associated authors; the
general idea is to consider directed associative relationships among them, to
decompose each pattern into its fundamental molecular components, and to
describe these communities by such atomic and molecular attributes. The
decomposition supports the management of discovered structures towards the use
of adaptive-incremental mind-maps (Figure 12), in which molecular structures
are discovered at the associative memory layer and first managed in the
short-term memory. Understanding bibliographic entries as data stream input,
this is an important step towards the interpretation of (temporal) social
communities as informational and intermediate results.